Method and apparatus for reducing library size on a compressed file system

ABSTRACT

A method, apparatus, and computer instructions for managing a library in a data processing system that supports file compression. Functions in the library that are unrequired on the data processing system are identified to form a set of identified functions. Existing data for the set of identified functions in the library is overwritten with new data that is more compressible than the existing data, so that compression of the library by the data processing system results in a library with a smaller size.

BACKGROUND OF THE INVENTION

[0001] 1. Technical Field

[0002] The present invention relates to an improved data processing system, and in particular to a method, apparatus, and computer instructions for managing libraries. Still more particularly, the present invention provides a method, apparatus, and computer instructions for managing libraries on a compressed file system.

[0003] 2. Description of Related Art

[0004] Data processing systems come in many different forms. Some data processing systems take the form of web servers and mainframes. At the other end of the spectrum are embedded computer systems. Embedded systems typically do not take the form of a desktop computer. An embedded system is a combination of hardware and software that performs a specific function. An embedded system may be part of a computer or part of some other system that is not a computer. For example, embedded systems are present in various devices, such as routers, biomedical appliances, cellular phones, personal digital assistants (PDAs), cameras, and camcorders.

[0005] In an embedded computer system environment, storage space is very limited. As a result, it is important to reduce the size of programs and data stored on those systems. File systems on embedded systems often use compression to save space. A file system is a mechanism for cataloging files in a computer.

[0006] One of the major consumers of space is the set of standard libraries, which provide a vast array of generic functions for applications. It is not unusual for these libraries to contain thousands of functions. Many of these functions, however, are not applicable to, or are simply not used by, a specific embedded system. As a result, these libraries waste a large amount of storage space.

[0007] Some techniques are currently available to reduce the space wasted by libraries. One technique creates or rebuilds the library so that it includes only the required functions. Another technique physically removes the unused functions from a library, compacts the library, and then readjusts the symbol table for the library.

[0008] Although these two techniques reduce the space required for a library, both have drawbacks. Rebuilding a library requires that the source code for the library functions be available. In many cases, the source code for a library is not provided or is hard to find. Even when the source code is available, it must be kept under source code control to maintain quality control, and rebuilding requires additional compilation time. The second technique requires readjusting the symbol table or tables to accommodate the new locations of the remaining functions. Although this task can be accomplished, the process is cumbersome and error-prone, and it makes debugging more complex.
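To illustrate why the second technique disturbs the symbol table, the following minimal Python sketch compacts a library by dropping unused functions. It assumes a hypothetical library whose symbol table maps each function name to an (offset, size) pair in bytes; the names A through D and all sizes are illustrative only and do not come from any real library.

```python
# Hypothetical symbol table: function name -> (offset, size) in bytes.
# Values are illustrative only.
symbol_table = {"A": (0, 400), "B": (400, 300), "C": (700, 500), "D": (1200, 200)}


def remove_functions(table: dict, unused: set) -> dict:
    """Compact the library by dropping unused functions (the prior-art approach).

    Returns a new symbol table; every surviving function that followed a
    removed one ends up at a lower offset than before.
    """
    new_table = {}
    next_offset = 0
    # Walk functions in address order so the compacted layout stays contiguous.
    for name, (offset, size) in sorted(table.items(), key=lambda item: item[1][0]):
        if name in unused:
            continue
        new_table[name] = (next_offset, size)
        next_offset += size
    return new_table


compacted = remove_functions(symbol_table, {"B", "D"})
moved = [name for name in compacted if compacted[name][0] != symbol_table[name][0]]
print(compacted)  # {'A': (0, 400), 'C': (400, 500)}
print(moved)      # ['C']
```

Removing B and D moves C from offset 700 to offset 400, so the symbol-table entry for C, and any reference that depends on its old location, would have to be rewritten.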

[0009] Therefore, it would be advantageous to have an improved method, apparatus, and computer instructions for managing a library in a manner that reduces the space required to store the library in a data processing system.

SUMMARY OF THE INVENTION

[0010] The present invention provides a method, apparatus, and computer instructions for managing a library in a data processing system that supports file compression. Functions in the library that are unrequired on the data processing system are identified to form a set of identified functions. Existing data for the set of identified functions in the library is overwritten with new data that is more compressible than the existing data, so that compression of the library by the data processing system results in a library with a smaller size.

BRIEF DESCRIPTION OF THE DRAWINGS

[0011] The novel features believed characteristic of the invention are set forth in the appended claims. The invention itself, however, as well as a preferred mode of use, further objectives and advantages thereof, will best be understood by reference to the following detailed description of an illustrative embodiment when read in conjunction with the accompanying drawings, wherein:

[0012] FIG. 1 is a diagram of an embedded computing system in the form of a personal digital assistant (PDA) in accordance with a preferred embodiment of the present invention;

[0013] FIG. 2 is a block diagram of a PDA in accordance with a preferred embodiment of the present invention;

[0014] FIG. 3 is a diagram illustrating an example library before and after compression;

[0015] FIG. 4 is a diagram illustrating the use of an existing technique to reduce library size;

[0016] FIG. 5 is a diagram illustrating a technique for reducing library sizes in accordance with a preferred embodiment of the present invention; and

[0017] FIG. 6 is a flowchart of a process for reducing the size of a library for a compressed file system in accordance with a preferred embodiment of the present invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

[0018] With reference now to the figures, and in particular with reference to FIG. 1, a diagram of an embedded computing system in the form of a personal digital assistant (PDA) is depicted in accordance with a preferred embodiment of the present invention. PDA 100 includes a display 102 for presenting textual and graphical information. Display 102 may be a known display device, such as a liquid crystal display (LCD) device. The display may be used to present a map or directions, calendar information, a telephone directory, or an electronic mail message. In these examples, display 102 may receive user input using an input device such as, for example, stylus 110.

[0019] PDA 100 may also include keypad 104, speaker 106, and antenna 108. Keypad 104 may be used to receive user input in addition to display 102. Speaker 106 provides a mechanism for audio output, such as presentation of an audio file. Antenna 108 provides a mechanism used in establishing a wireless communications link between PDA 100 and a network.

[0020] PDA 100 also preferably includes a graphical user interface that may be implemented by means of systems software residing in computer readable media in operation within PDA 100.

[0021] Turning now to FIG. 2, a block diagram of a PDA is shown in accordance with a preferred embodiment of the present invention. PDA 200 is an example of a PDA, such as PDA 100 in FIG. 1, in which code or instructions implementing the processes of the present invention may be located. PDA 200 includes a bus 202 to which processor 204 and main memory 206 are connected. Display adapter 208, keypad adapter 210, storage 212, and audio adapter 214 are also connected to bus 202. Cradle link 216 provides a mechanism to connect PDA 200 to a cradle used in synchronizing data in PDA 200 with another data processing system. Display adapter 208 also includes a mechanism to receive user input from a stylus when a touch screen display is employed.

[0022] An operating system runs on processor 204 and is used to coordinate and provide control of various components within PDA 200 in FIG. 2. The operating system may be, for example, a commercially available operating system such as Windows CE, which is available from Microsoft Corporation. Instructions for the operating system and applications or programs are located on storage devices, such as storage 212, and may be loaded into main memory 206 for execution by processor 204.

[0023] Those of ordinary skill in the art will appreciate that the hardware in FIG. 2 may vary depending on the implementation. Other internal hardware or peripheral devices, such as flash ROM (or equivalent nonvolatile memory) or optical disk drives and the like, may be used in addition to or in place of the hardware depicted in FIG. 2.

[0024] Further, the mechanism of the present invention may be implemented in types of embedded computing systems other than PDA 200. PDA 200 is illustrated only to provide an example hardware platform in which the mechanism of the present invention may be implemented. The present invention may be implemented on other embedded computer systems, such as, for example, a cellular phone, a camera, a printer, a router, or a set-top box. Additionally, the mechanism of the present invention may be applied to other data processing systems in which a compressed file system is present.

[0025] With reference to FIG. 3, a diagram illustrating an example library before and after compression is shown. In this example, library 300 contains functions A 302, B 304, C 306, and D 308. Additionally, library 300 also includes symbol table 310. After compression by the file system, compressed library 312 is formed, which has a smaller size than library 300, as illustrated in FIG. 3. As can be seen, all of the same functions found in library 300 are still present within compressed library 312. The size of library 300 has been reduced through compression performed by the file system.

[0026] Turning now to FIG. 4, a diagram illustrating the use of an existing technique to reduce library size is shown. Although compression reduced the size of library 300, as shown in FIG. 3, further reductions can be achieved by removing functions or rebuilding the library. Through rebuilding or removing functions, library 400 is created, which contains function A 302 and function C 306, along with symbol table 310. When library 400 is compressed, compressed library 402 is formed, which has a smaller size than library 400. As can be seen, rebuilding the library or removing functions yields additional size reduction. These techniques, however, suffer from the complexity and cumbersomeness described above.

[0027] The present invention overcomes the complexity and cumbersomeness of the presently available techniques because the library is not rebuilt and functions are not removed. Instead, the present invention takes advantage of the compression already used by such a data processing system by replacing the data of unused functions with data that compresses extremely well. For example, a run, or string, containing a repetition of a single byte value may be used to replace the data for an unrequired function. The run may be, for example, a repeating series of binary zeros or, depending on the particular implementation, a repeating run of the letter “a”. Because functions are not removed, the remaining functions keep their original locations, and the symbol table does not need to be readjusted.
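The replacement step can be sketched in Python as follows. The sketch assumes the library is available as a byte image and that a hypothetical symbol table gives each function's (offset, size) within that image; random bytes stand in for machine code, and the name blank_functions is illustrative. Because each overwrite writes exactly as many bytes as it replaces, the image size and every symbol-table entry stay the same.

```python
import os

# Hypothetical symbol table: function name -> (offset, size) in bytes.
symbol_table = {"A": (0, 400), "B": (400, 300), "C": (700, 500), "D": (1200, 200)}
LIBRARY_SIZE = 1400


def blank_functions(image: bytearray, table: dict, unrequired: set, fill: int = 0x00) -> None:
    """Overwrite each unrequired function in place with a run of one byte value."""
    for name in unrequired:
        offset, size = table[name]
        # Same-length slice assignment: the image neither grows nor shrinks,
        # so no other function moves and the symbol table needs no changes.
        image[offset:offset + size] = bytes([fill]) * size


original = os.urandom(LIBRARY_SIZE)  # stand-in for real machine code
image = bytearray(original)
blank_functions(image, symbol_table, {"B", "D"})
```

Passing `fill=ord("a")` instead of the default zero byte produces the run of the letter “a” mentioned above; any single repeated value keeps the region highly compressible.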

[0028] Turning now to FIG. 5, a diagram illustrating a technique for reducing library sizes is depicted in accordance with a preferred embodiment of the present invention. In this example, library 300 is modified to form modified library 500. As with library 400, function A 302 and function C 306 are required functions, while function B 304 and function D 308 are unrequired functions in the library. In this example, the data for function B 304 and function D 308 are replaced with easily compressible data 502 and 504.

[0029] In these examples, easily compressible data 502 and 504 may take the form of a string or run of a repeating single byte value. For example, binary zeros may be used for easily compressible data 502 and 504. Of course, any consistent value may be employed in these examples. After compression by the file system, compressed library 506 is formed, with a size that is smaller than that of compressed library 312. The normal compression ratio for a library that is not modified is a little over 3 to 1. When a library is modified according to the present invention, such as with a series of binary zeros, the compression ratio reaches about 1000 to 1 for the areas in which functions are replaced with binary zeros. The amount of compression also depends on the type of compression algorithm used.
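The effect of the fill data on compression can be checked with a small Python experiment. It uses zlib (deflate) only as a stand-in for whatever compressor the file system actually uses, and random bytes as a stand-in for code. Random bytes are essentially incompressible, so this sketch does not reproduce the roughly 3 to 1 ratio quoted above for real libraries; it only shows how a zero-filled region behaves. The zlib documentation puts deflate's theoretical limit at about 1032 to 1, so a zero run compressing at about 1000 to 1 is consistent with the figure above.

```python
import os
import zlib


def compressed_size(data: bytes) -> int:
    """Size in bytes after deflate compression at the highest level (9)."""
    return len(zlib.compress(data, 9))


SIZE = 1_000_000

# A run of binary zeros, as used for the replaced functions.
zero_run = bytes(SIZE)
zero_ratio = SIZE / compressed_size(zero_run)

# Stand-in "library": random bytes, then the same bytes with the second half
# blanked as if the functions stored there were unrequired.
library = os.urandom(SIZE)
modified = library[: SIZE // 2] + bytes(SIZE // 2)

before = compressed_size(library)
after = compressed_size(modified)
print(f"zero run ratio: {zero_ratio:.0f} to 1")
print(f"library: {before} bytes compressed; modified: {after} bytes compressed")
```

With a real library the unmodified regions would compress somewhat (the text cites a little over 3 to 1), so the absolute sizes would differ, but the blanked regions would still shrink to almost nothing.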

[0030] With reference now to FIG. 6, a flowchart of a process for reducing the size of a library for a compressed file system is depicted in accordance with a preferred embodiment of the present invention. The process illustrated in FIG. 6 may be implemented in a data processing system such as PDA 200 in FIG. 2, or in a data processing system such as a desktop computer or a workstation. Further, this process may be implemented on a data processing system that does not itself compress files, with the modified library then being saved to a system on which a compressed file system is present.

[0031] The process begins by identifying the names of functions that are not required (step 600). Thereafter, the addresses of the identified functions are located (step 602). These addresses, along with the lengths of the functions, may be found in the symbol table for the library. After the addresses have been located, the data for the identified functions is overwritten with new, more compressible data (step 604). Thereafter, the library is saved in the compressed file system (step 606), with the process terminating thereafter.
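Steps 600 through 606 can be combined into a single Python sketch. It assumes the caller supplies the set of required function names and a symbol table already reduced to name to (file offset, length) pairs; the function reduce_library, its parameters, and the output file name are hypothetical. In a real ELF library, for example, symbol entries record a value and a size (st_value and st_size), and converting a symbol's value into a file offset requires the section headers, which this sketch does not parse. Step 606 only writes the file, because compression is performed by the compressed file system itself.

```python
import os
import tempfile


def reduce_library(image: bytes, symbol_table: dict, required: set,
                   out_path: str, fill: int = 0x00) -> list:
    """Blank unrequired functions and save the library; returns the names blanked."""
    # Step 600: identify the names of functions that are not required.
    unrequired = sorted(name for name in symbol_table if name not in required)

    modified = bytearray(image)
    for name in unrequired:
        # Step 602: locate the function's address (offset) and length.
        offset, size = symbol_table[name]
        if offset < 0 or offset + size > len(modified):
            raise ValueError(f"symbol {name!r} lies outside the library image")
        # Step 604: overwrite the function body with a run of one byte value.
        modified[offset:offset + size] = bytes([fill]) * size

    # Step 606: save the library; the compressed file system compresses it on write.
    with open(out_path, "wb") as f:
        f.write(modified)
    return unrequired


# Example with a hypothetical four-function library (as in FIG. 5, A and C are required).
table = {"A": (0, 400), "B": (400, 300), "C": (700, 500), "D": (1200, 200)}
library = os.urandom(1400)  # stand-in for real library contents
out_path = os.path.join(tempfile.mkdtemp(), "libexample.so")  # illustrative file name
blanked = reduce_library(library, table, {"A", "C"}, out_path)
print(blanked)  # ['B', 'D']
```

The bounds check guards against a symbol table that does not match the image; the saved file has the same length as the original, so tools that rely on the symbol table can still locate the required functions.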

[0032] Thus, the present invention provides a method, apparatus, and computer instructions for library size reduction on a compressed file system. The mechanism of the present invention allows for increased compression of libraries without requiring the cumbersome and complex processes of creating or rebuilding a library with only the required functions or of physically removing unused functions from a library. The mechanism of the present invention provides this advantage by replacing the data for unrequired functions with easily compressible data. Because no functions are removed, the symbol table does not need to be readjusted, and because the replacement data is easily compressible, the compression system already present in the file system reduces the size of the library.

[0033] It is important to note that while the present invention has been described in the context of a fully functioning data processing system, those of ordinary skill in the art will appreciate that the processes of the present invention are capable of being distributed in the form of a computer readable medium of instructions and a variety of forms and that the present invention applies equally regardless of the particular type of signal bearing media actually used to carry out the distribution. Examples of computer readable media include recordable-type media, such as a floppy disk, a hard disk drive, a RAM, CD-ROMs, DVD-ROMs, and transmission-type media, such as digital and analog communications links, wired or wireless communications links using transmission forms, such as, for example, radio frequency and light wave transmissions. The computer readable media may take the form of coded formats that are decoded for actual use in a particular data processing system.

[0034] The description of the present invention has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art. The embodiment was chosen and described in order to best explain the principles of the invention, the practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated. 

What is claimed is:
 1. A method in a data processing system for managing a library, the method comprising: identifying functions in the library that are unrequired on the data processing system to form a set of identified functions; and overwriting existing data for the set of identified functions in the library with new data that is more compressible than the existing data to form a modified library, wherein compression of the library by the data processing system results in a compressed library with a smaller size.
 2. The method of claim 1, wherein the new data is a run of a single byte value.
 3. The method of claim 1, wherein the new data is a string of binary 0's.
 4. The method of claim 1, wherein the new data is a string of binary 1's.
 5. The method of claim 1, wherein the identifying step includes: identifying names of functions unrequired on the data processing system to form the set of identified functions; and locating addresses and lengths for functions in the set of identified functions.
 6. The method of claim 5, wherein the addresses and the lengths are located by looking in a symbol table for the library.
 7. The method of claim 1, wherein the data processing system is selected from one of a personal digital assistant, camera, cellular phone, biomedical appliance, set-top box, or desktop computer.
 8. The method of claim 1, wherein the modified library is stored on an embedded computer system that supports file compression.
 9. A data processing system for managing a library, the data processing system comprising: identifying means for identifying functions in the library that are unrequired on the data processing system to form a set of identified functions; and overwriting means for overwriting existing data for the set of identified functions in the library with new data that is more compressible than the existing data, wherein compression of the library by the data processing system results in a library with a smaller size.
 10. The data processing system of claim 9, wherein the new data is a run of a single byte value.
 11. The data processing system of claim 9, wherein the new data is a string of binary 0's.
 12. The data processing system of claim 9, wherein the new data is a string of binary 1's.
 13. The data processing system of claim 9, wherein the identifying means includes: first means for identifying names of functions unrequired on the data processing system to form the set of identified functions; and second means for locating addresses and lengths for functions in the set of identified functions.
 14. The data processing system of claim 13, wherein the addresses and the lengths are located by looking in a symbol table for the library.
 15. A data processing system comprising: a bus system; a memory connected to the bus system, wherein the memory includes a set of instructions; and a processing unit connected to the bus system, wherein the processing unit executes the set of instructions to identify functions in a library that are unrequired on the data processing system to form a set of identified functions; and overwrite existing data for the set of identified functions in the library with new data that is more compressible than the existing data to form a modified library, wherein compression of the library by the data processing system results in a compressed library with a smaller size.
 16. A computer program product in a computer readable medium for managing a library in a data processing system, the computer program product comprising: first instructions for identifying functions in the library that are unrequired on the data processing system to form a set of identified functions; and second instructions for overwriting existing data for the set of identified functions in the library with new data that is more compressible than the existing data to form a modified library, wherein compression of the library by the data processing system results in a compressed library with a smaller size than the library.
 17. The computer program product of claim 16, wherein the new data is a run of a single byte value.
 18. The computer program product of claim 16, wherein the new data is a string of binary 0's.
 19. The computer program product of claim 16, wherein the new data is a string of binary 1's.
 20. The computer program product of claim 16, wherein the first instructions include: first sub-instructions for identifying names of functions unrequired on the data processing system to form the set of identified functions; and second sub-instructions for locating addresses and lengths for functions in the set of identified functions.
 21. The computer program product of claim 20, wherein the addresses and the lengths are located by looking in a symbol table for the library.